Scalable Identifiers for Natural
      History Collections
                12 August 2012

    University of California Curation Center
           California Digital Library
California Digital Library
Serving the University of California
• 10 campuses
• 360K students, faculty, and staff
• 100’s of museums, art galleries, observatories, marine
   centers, botanical gardens
• 5 medical centers
• 5 law schools
• 3 National Labs
CDL supports the research lifecycle
• Collections
• Digital Special Collections
• Discovery & Delivery
• Publishing Group
• UC Curation Center (UC3)
The research data problem
an article about data, but no data
What EZID data citation offers
•   Precise identification of a dataset (DOI, ARK)
•   Credit to data producers and data publishers
•   A link from traditional literature to the data
•   Exposure and research metrics for datasets
    (Web of Knowledge, Google)
EZID: Long term identifiers made easy




                        Take control of the
                        management and distribution
                        of your research, share and get
                        credit for it, and build your
                        reputation through its collection
                        and documentation
DataCite
German National Library of Economics (ZBW)
German National Library of Science and Technology (TIB)
German National Library of Medicine (ZB MED)
GESIS - Leibniz Institute for the Social Sciences, Germany
Australian National Data Service (ANDS)
ETH Zurich, Switzerland
Canada Institute for Scientific and Technical Info. (CISTI)
Technical Information Center of Denmark
Institute for Scientific & Technical Information (INIST-CNRS), France
TU Delft Library, The Netherlands
The Swedish National Data Service (SNDS)
The British Library, UK
California Digital Library (CDL), USA
Office of Scientific & Technical Information (OSTI), USA
Purdue University Library
EZID Clients
A current, partial list

UC Berkeley Library (on behalf of the UC Berkeley campus)
Sponsored accounts:
      Open Context
      CRCNS.org
UC San Diego Library (on behalf of the UC San Diego campus)
American Astronomical Society (AAS)
Centre national de documentation pédagogique (CNDP)
Cornell Institute for Social & Economic Research
The Digital Archaeological Record (tDAR)
Dryad Digital Repository
Fred Hutchinson Cancer Research Center
LabArchives
National Center for Atmospheric Research (NCAR)
USGS/Earth Sciences Data Clearinghouse (formerly National Biological Info. Infrastructure)
New features in development

• Suffix pass-thru: do NT and get N/ST/S for free
• Service replicas: manager and resolver
• Content negotiation and inflections: ? ?? / .
• URN (Uniform Resource Name) support (urn:uuid:)
• ARK community and governance, eg, registries
Some identifier dimensions
• registration (storing and updating ids for
  resolution)
• non-registration (id awareness via rules)
• persistence flavors
• resolution
• clusters (closely coupled ids)
• other relations (part, whole, related)
Identifier generation
• inspiration ("I think I'll call it MyKitty/Photos")
• systematic inspiration (title/author/vol/issue)
• counter (421, 422, 423, ...)
• timestamp
• hash computed over content (MD5, SHA256)
• hash of randomized timestamp plus registry
  (uuidgen, noid)
• randomized counter plus registry (EZID/noid)
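Several of the generation methods above can be sketched with the Python standard library. This is only an illustration: the function names are made up for this example, and the noid-style "randomized counter plus registry" used by EZID is not reproduced here.

```python
import hashlib
import itertools
import time
import uuid

# Counter: short and simple, but one authority must hand out the numbers.
_counter = itertools.count(421)

def next_counter_id() -> str:
    return str(next(_counter))

def timestamp_id() -> str:
    # Nanosecond clock reading; two minters sharing a clock tick could collide.
    return str(time.time_ns())

def content_hash_id(content: bytes) -> str:
    # Same bytes always give the same id; any change in the bytes gives a new id.
    return hashlib.sha256(content).hexdigest()

def random_id() -> str:
    # Random (version 4) UUID per RFC 4122, comparable to uuidgen output.
    return str(uuid.uuid4())
```

A content hash only suits never-changing content, since correcting the content changes the id; the counter and random methods leave the id independent of the content.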
Identifier registration
• use filesystem tree as resolver (any old
  website)
• use web server config file
• use web server backing database
• use a service (bit.ly, EZID, DataCite, local
  Handle service)
Identifier non-registration
Identifiers “exposed” but not registered, eg,
  awareness via rules
• extension (abc/def is "part of" abc)
• parameter (abc_N_M works for N or M less
  than 100,000)
• general query (arbitrary data cells)
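A rule-based check along these lines might look like the sketch below. It assumes a single registered base id "abc", and it reads the parameter rule as requiring both N and M to be below 100,000; both points are assumptions made for illustration.

```python
import re

# Hypothetical parameter rule: "abc_N_M" with N and M from 0 to 99999.
_PARAM_RULE = re.compile(r"abc_([0-9]{1,5})_([0-9]{1,5})")

def is_known(identifier: str) -> bool:
    """Return True if the id is registered or covered by a rule."""
    # Registered base id, or extension rule: abc/def is "part of" abc.
    if identifier == "abc" or identifier.startswith("abc/"):
        return True
    # Parameter rule: nothing is stored per id, the pattern alone decides.
    return _PARAM_RULE.fullmatch(identifier) is not None
```

No per-identifier record exists for `abc/def` or `abc_7_99999`; the service knows them only through the rules.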
Identifier persistence flavors
• persistent id to very dynamic content
  (eg, home page)
• persistent id to stable but correctable content
  (eg, landing page)
• persistent id to never-changing content
  (eg, spreadsheet)
  – persistent ids to non-recommended content
• persistent id to stable but growing content
  (serial pub)
Identifier resolution
• DNS (domain names)
• DNS + HTTP (any website)
• DNS + HTTP + redirects (eg, URL
  shorteners, N2T/EZID system)
• DNS + HTTP + redirects + Handle resolver
  (DOIs and Handles)
Identifier clusters
Related, but very closely coupled identifiers
• object files
• alternate object files
• object metadata
GUID Definitions
• GUID -- Definition 1 (wikipedia)
  – A 128-bit id generated per RFC 4122, eg,
  – uuidgen -> EEF45689-BBE5-4FB6-9E80-41B78F6578E2
• GUID -- definition 2 (earth sciences?)
  – any globally unique identifier
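Definition 1 can be checked with Python's standard uuid module, as in this small sketch. It parses the uuidgen example above and also prints its urn:uuid form.

```python
import uuid

# Parse the uuidgen example from the slide.
u = uuid.UUID("EEF45689-BBE5-4FB6-9E80-41B78F6578E2")

bits = u.int.bit_length()  # value fits in 128 bits
version = u.version        # RFC 4122 version field
urn = u.urn                # urn:uuid form, lowercased by the module

print(bits <= 128, version, urn)
```

The module normalizes the hex digits to lowercase, so the URN form reads urn:uuid:eef45689-bbe5-4fb6-9e80-41b78f6578e2.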
Service replicas
• EZID is an id manager that populates N2T
   – It tolerates down time
   – Other id manager services might one day populate N2T
• N2T (Name-to-Thing) is an id resolver that ...
   – It is very intolerant of down time, since it services all
     access requests for locations and metadata
   – N2T replicas underway
URN support
• N2T and EZID are agnostic about kinds of
  things, names, and metadata
   – Digital, physical, abstract, living, fictional, groups, etc.
   – Any metadata & known profiles (DataCite, Dublin Kernel)
   – ARK, DOI, URN, Handle, IVOA, LSID, PMID, etc., requiring
     namespace “write” permission, eg, via DataCite
• In test: Uniform Resource Names (URNs)
   – urn:uuid namespace
Under the hood keysmithing terms:
bows, shoulders, blades, tips, covers
Suffix pass-thru: NT gets N/ST/S for free

Idea: if name N points to target T, then requests for N
  extended by any suffix N/S can take you to T/S
• For dataset doi:10.5072/Big4 with 10,000
  nameable components,
   – Register and manage 10,001 names or 1 name?
   – Eg, http://x.y.z/foo/Big4/db/table/cell/45-8.txt could be
     reached with doi:10.5072/Big4/table/cell/45-8.txt
• In test with ARKs. Conflict with other resolvers?
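The idea can be sketched as a longest-prefix lookup. This is a minimal illustration, not the N2T implementation: the registry dict is hypothetical, and the target http://x.y.z/foo/Big4/db for doi:10.5072/Big4 is assumed from the example URL above.

```python
# Hypothetical registry: one registered name N mapped to its target T.
REGISTRY = {
    "doi:10.5072/Big4": "http://x.y.z/foo/Big4/db",
}

def resolve(identifier: str):
    """Return the target for an id, passing any unregistered suffix through."""
    # Try the full id first, then ever shorter prefixes cut at "/" boundaries.
    candidate = identifier
    while candidate:
        if candidate in REGISTRY:
            suffix = identifier[len(candidate):]
            return REGISTRY[candidate] + suffix
        if "/" not in candidate:
            break
        candidate = candidate.rsplit("/", 1)[0]
    return None  # no registered prefix found

print(resolve("doi:10.5072/Big4/table/cell/45-8.txt"))
```

With one registered name, the request for the component resolves to http://x.y.z/foo/Big4/db/table/cell/45-8.txt, so the 10,000 component names never need to be registered.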
Tombstone and other surrogate pages

Tombstone, incubation, and other surrogate pages
  (probation?) auto-generated from metadata, eg,
  http://n2t.net/ezid/tombstone/id/ark:/20775/bb3243444z
Reserved identifiers and multiple targets

• Some ids must be created and managed (reserved)
  before going public, eg, for manuscript preparation
• In test: infrastructure for multiple targets and
  multiple instances of any metadata element
• What should user experience be for multiple targets?
   – Present a menu of targets (burden of choice)?
   – One target chosen for them (burden of inflexibility)?
Identifier (ARK) inflections: ? ?? / .

• Inflect: change endings without creating new words
  – Terminal ? means “I want metadata”, which is similar to
    linked data content negotiation (also in EZID test)
  – Terminal ?? means “I also want support metadata”
  – Drawing board: / could mean “I want a landing page”
    and . could mean “I want the usual computable thing”
• Allow inflections beyond ARKs to DOIs/URNs?
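A resolver could separate the inflection from the identifier before lookup, roughly as sketched here. The function name and the returned labels are made up for this example, and the drawing-board "/" and "." inflections are left out.

```python
def split_inflection(request: str):
    """Split a requested id into (base id, kind of response wanted)."""
    # Check "??" before "?" so the longer ending wins.
    if request.endswith("??"):
        return request[:-2], "metadata+support"
    if request.endswith("?"):
        return request[:-1], "metadata"
    return request, "object"

print(split_inflection("ark:/13030/qt0349g1rh?"))
```

The base identifier is the same in all three cases, so no new ids have to be created or registered to ask for metadata.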
Example: http://n2t.net/ark:/13030/qt0349g1rh?
        Renninger, Heidi; Phillips, Nathan; Hodel, Donald. “Comparative hydraulic and
            anatomic properties in palm trees (Washingtonia robusta) of varying
            heights”. 2009-04-29. ark:/13030/qt0349g1rh



     HTML content with
     embedded comments in
     ANVL/ERC and RDF



erc:
who: Renninger, Heidi,; Phillips,
   Nathan,; Hodel, Donald,
what: Comparative hydraulic and
   anatomic properties in palm
   trees (Washingtonia robusta)
   of varying heights
when: 2009-04-29
where: ark:/13030/qt0349g1rh
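A record like the one above is easy to read by machine. The sketch below is a simplified parser written for this example: it handles "label: value" lines and indented continuation lines only, and it ignores other ANVL features such as comments and repeated labels.

```python
def parse_anvl(text: str) -> dict:
    """Parse a simple ANVL record into a dict of label -> value."""
    record = {}
    label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and label is not None:
            # Indented continuation line: fold it into the previous value.
            record[label] = (record[label] + " " + line.strip()).strip()
        else:
            # Split at the first colon only, so "ark:/..." values stay whole.
            label, _, value = line.partition(":")
            label = label.strip()
            record[label] = value.strip()
    return record

RECORD = """erc:
who: Renninger, Heidi,; Phillips,
   Nathan,; Hodel, Donald,
what: Comparative hydraulic and
   anatomic properties in palm
   trees (Washingtonia robusta)
   of varying heights
when: 2009-04-29
where: ark:/13030/qt0349g1rh
"""

erc = parse_anvl(RECORD)
print(erc["when"], erc["where"])
```

The four ERC elements (who, what, when, where) come back as plain strings, with the wrapped title joined into one line.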
ARK community and governance

•   ARK mailing list: arks-forum@googlegroups.com
•   Topics: governance, community, standardization
•   Registry maintenance: shoulders and NAANs
•   N2T consortium with alternative EZID-like services

More Related Content

What's hot

The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...National Institute of Informatics (NII)
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersJez Cope
 
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...National Institute of Informatics (NII)
 
Digital library software
Digital library softwareDigital library software
Digital library softwareavid
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloudNational Institute of Informatics
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessdatacite
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...OpenAIRE
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identificationguest453b14
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...GigaScience, BGI Hong Kong
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsAaron Collie
 

What's hot (14)

The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...The Experimental Project of DOI Registration for Research Data at Japan Link...
The Experimental Project of DOI Registration for Research Data at Japan Link...
 
University of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchersUniversity of Bath Research Data Management training for researchers
University of Bath Research Data Management training for researchers
 
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
Research Data-DOI Experiment in Japanese DOI Registration Agency (Japan Link ...
 
Digital library software
Digital library softwareDigital library software
Digital library software
 
Toward universal information access on the digital object cloud
Toward universal information access on the digital object cloudToward universal information access on the digital object cloud
Toward universal information access on the digital object cloud
 
Riding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information accessRiding the wave - Paradigm shifts in information access
Riding the wave - Paradigm shifts in information access
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
 
Dataset Citation and Identification
Dataset Citation and IdentificationDataset Citation and Identification
Dataset Citation and Identification
 
Digital Library Software
Digital Library SoftwareDigital Library Software
Digital Library Software
 
Identifying psychological research data in the digital environment.
Identifying psychological research data in the digital environment. Identifying psychological research data in the digital environment.
Identifying psychological research data in the digital environment.
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
Alexandra Basford, InCoB 2011: A Journal’s Perspective on Data Standards and ...
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
 

Viewers also liked

Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research DatasetsJohn Kunze
 
Media literacy for the information professional
Media literacy for the information professionalMedia literacy for the information professional
Media literacy for the information professionalBarbara Devilee
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyJohn Kunze
 
Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Sheila Webber
 
Media and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveMedia and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveThai Netizen Network
 
Information literacy in a media-saturated world
Information literacy in a media-saturated worldInformation literacy in a media-saturated world
Information literacy in a media-saturated worldPam Wilson
 
Media and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversityMedia and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversitySheila Webber
 

Viewers also liked (7)

Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research Datasets
 
Media literacy for the information professional
Media literacy for the information professionalMedia literacy for the information professional
Media literacy for the information professional
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabulary
 
Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...Putting the IFLA Media & Information Literacy Recommendations into practice i...
Putting the IFLA Media & Information Literacy Recommendations into practice i...
 
Media and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspectiveMedia and Information Literacy - A Thai Netizen perspective
Media and Information Literacy - A Thai Netizen perspective
 
Information literacy in a media-saturated world
Information literacy in a media-saturated worldInformation literacy in a media-saturated world
Information literacy in a media-saturated world
 
Media and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversityMedia and Information Literacy: strength through diversity
Media and Information Literacy: strength through diversity
 

Similar to Scalable Identifiers for Natural History Collections

RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemASIS&T
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefCrossref
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrCarly Strasser
 
IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentationekansa
 
Measuring Science – Tracing the authors
Measuring Science – Tracing the authorsMeasuring Science – Tracing the authors
Measuring Science – Tracing the authors Andrea Scharnhorst
 
Experiences (mis)managing archaeological data
Experiences (mis)managing archaeological dataExperiences (mis)managing archaeological data
Experiences (mis)managing archaeological datadata_management
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsJohn Kunze
 
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI PresentationOpen Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentationekansa
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"GigaScience, BGI Hong Kong
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsJon Voss
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projectszsrlibrary
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...ARDC
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...aceas13tern
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositoriesandrea huang
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeGigaScience, BGI Hong Kong
 

Similar to Scalable Identifiers for Natural History Collections (20)

Dataset Metadata, Tools and Approaches for Access and Preservation
Dataset Metadata, Tools and Approaches for Access and PreservationDataset Metadata, Tools and Approaches for Access and Preservation
Dataset Metadata, Tools and Approaches for Access and Preservation
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
DataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRefDataCite: the Perfect Complement to CrossRef
DataCite: the Perfect Complement to CrossRef
 
Researh data management
Researh data managementResearh data management
Researh data management
 
IASSIST identifiers By Joan Starr
IASSIST identifiers By Joan StarrIASSIST identifiers By Joan Starr
IASSIST identifiers By Joan Starr
 
IASSIT Kansa Presentation
IASSIT Kansa PresentationIASSIT Kansa Presentation
IASSIT Kansa Presentation
 
Measuring Science – Tracing the authors
Measuring Science – Tracing the authorsMeasuring Science – Tracing the authors
Measuring Science – Tracing the authors
 
Data Publishing in Archaeozoology
Data Publishing in ArchaeozoologyData Publishing in Archaeozoology
Data Publishing in Archaeozoology
 
Experiences (mis)managing archaeological data
Experiences (mis)managing archaeological dataExperiences (mis)managing archaeological data
Experiences (mis)managing archaeological data
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many Fronts
 
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI PresentationOpen Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
Open Context and Publishing to the Web of Data: Eric Kansa's LAWDI Presentation
 
Jan Brase: Data and Libraries - the DataCite consortium
Jan Brase: Data and Libraries - the DataCite consortiumJan Brase: Data and Libraries - the DataCite consortium
Jan Brase: Data and Libraries - the DataCite consortium
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Intro to Digitization Projects
Intro to Digitization ProjectsIntro to Digitization Projects
Intro to Digitization Projects
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
Scott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data delugeScott Edmunds: Data publication in the data deluge
Scott Edmunds: Data publication in the data deluge
 

More from John Kunze

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ MetadictionaryJohn Kunze
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderJohn Kunze
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...John Kunze
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDLJohn Kunze
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy buildingJohn Kunze
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for PersistenceJohn Kunze
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesJohn Kunze
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsJohn Kunze
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardJohn Kunze
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014John Kunze
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupJohn Kunze
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchJohn Kunze
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long TailJohn Kunze
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayJohn Kunze
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldJohn Kunze
 
New Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsNew Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsJohn Kunze
 
Pairtrees for object storage
Pairtrees for object storagePairtrees for object storage
Pairtrees for object storageJohn Kunze
 
The BagIt file package format
The BagIt file package formatThe BagIt file package format
The BagIt file package formatJohn Kunze
 

More from John Kunze (19)

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ Metadictionary
 

Scalable Identifiers for Natural History Collections

  • 1. Scalable Identifiers for Natural History Collections 12 August 2012 University of California Curation Center California Digital Library
  • 2. California Digital Library Serving the University of California • 10 campuses • 360K students, faculty, and staff • 100’s of museums, art galleries, observatories, marine centers, botanical gardens • 5 medical centers • 5 law schools • 3 National Labs CDL supports the research lifecycle • Collections • Digital Special Collections • Discovery & Delivery • Publishing Group • UC Curation Center (UC3)
  • 3. The research data problem an article about data, but no data
  • 4. What EZID data citation offers • Precise identification of a dataset (DOI, ARK) • Credit to data producers and data publishers • A link from traditional literature to the data • Exposure and research metrics for datasets (Web of Knowledge, Google)
  • 5. EZID: Long term identifiers made easy Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
  • 6. EZID: Long term identifiers made easy Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation
  • 7. DataCite German National Library of Economics (ZBW) German National Library of Science and Technology (TIB) Canada Institute for Scientific and Technical Info. (CISTI) German National Library of Medicine (ZB MED) Technical Information Center of Denmark GESIS - Leibniz Institute for the Social Sciences, Germany Institute for Scientific & Technical Information (INIST-CNRS), France Australian National Data Service (ANDS) ETH Zurich, Switzerland TU Delft Library, The Netherlands The Swedish National Data Service (SNDS) The British Library, UK California Digital Library (CDL), USA Office of Scientific & Technical Information (OSTI), USA Purdue University Library
  • 8. EZID Clients A current, partial list UC Berkeley Library (on behalf of the UC Berkeley campus) The Digital Archaeological Record (tDAR) Sponsored accounts: Open Context Dryad Digital Repository CRCNS.org UC San Diego Library (on behalf of the UC San Diego campus) Fred Hutchinson Cancer Research Center American Astronomical Society (AAS) LabArchives Centre national de documentation pédagogique (CNDP) National Center for Atmospheric Research (NCAR) Cornell Institute for Social & Economic Research USGS/Earth Sciences Data Clearinghouse (formerly National Biological Info. Infrastructure)
  • 9. New features in development • Suffix pass-thru: do N→T and get N/S→T/S for free • Service replicas: manager and resolver • Content negotiation and inflections: ? ?? / . • URN (Uniform Resource Name) support (urn:uuid:) • ARK community and governance, eg, registries
  • 10. Some identifier dimensions • registration (storing and updating ids for resolution) • non-registration (id awareness via rules) • persistence flavors • resolution • clusters (closely coupled ids) • other relations (part, whole, related)
  • 11. Identifier generation • inspiration ("I think I'll call it MyKitty/Photos") • systematic inspiration (title/author/vol/issue) • counter (421, 422, 423, ...) • timestamp • hash computed over content (MD5, SHA256) • hash of randomized timestamp plus registry (uuidgen, noid) • randomized counter plus registry (EZID/noid)
  • 12. Identifier registration • use filesystem tree as resolver (any old website) • use web server config file • use web server backing database • use a service (bit.ly, EZID, DataCite, local Handle service)
  • 13. Identifier non-registration Identifiers “exposed” but not registered, eg, awareness via rules • extension (abc/def is "part of" abc) • parameter (abc_N_M works for N or M less than 100,000) • general query (arbitrary data cells)
  • 14. Identifier persistence flavors • persistent id to very dynamic content (eg, home page) • persistent id to stable but correctable content (eg, landing page) • persistent id to never-changing content (eg, spreadsheet) – persistent ids to non-recommended content • persistent id to stable but growing content (serial pub)
  • 15. Identifier resolution • DNS (domain names) • DNS + HTTP (any website) • DNS + HTTP + redirects (eg, URL shorteners, N2T/EZID system) • DNS + HTTP + redirects + Handle resolver (DOIs and Handles)
  • 16. Identifier clusters Related, but very closely coupled identifiers • object files • alternate object files • object metadata
  • 17. GUID Definitions • GUID -- Definition 1 (Wikipedia) – A 128-bit id generated per RFC 4122, eg, – uuidgen -> EEF45689-BBE5-4FB6-9E80-41B78F6578E2 • GUID -- Definition 2 (earth sciences?) – any globally unique identifier
  • 18. Service replicas • EZID is an id manager that populates N2T – It tolerates down time – Other id manager services might one day populate N2T • N2T (Name-to-Thing) is an id resolver that ... – It is very intolerant of down time, since it services all access requests for locations and metadata – N2T replicas underway
  • 19. URN support • N2T and EZID are agnostic about kinds of things, names, and metadata – Digital, physical, abstract, living, fictional, groups, etc. – Any metadata & known profiles (DataCite, Dublin Kernel) – ARK, DOI, URN, Handle, IVOA, LSID, PMID, etc., requiring namespace “write” permission, eg, via DataCite • In test: Uniform Resource Names (URNs) – urn:uuid namespace
  • 20. Under the hood keysmithing terms: bows, shoulders, blades, tips, covers
  • 21. Suffix pass-thru: N→T gets N/S→T/S for free Idea: if name N points to target T, then requests for N extended by any suffix N/S can take you to T/S • For dataset doi:10.5072/Big4 with 10,000 nameable components, – Register and manage 10,001 names or 1 name? – Eg, http://x.y.z/foo/Big4/db/table/cell/45-8.txt could be reached with doi:10.5072/Big4/table/cell/45-8.txt • In test with ARKs. Conflict with other resolvers?
  • 22. Tombstone and other surrogate pages Tombstone, incubation, and other surrogate pages (probation?) auto-generated from metadata, eg, http://n2t.net/ezid/tombstone/id/ark:/20775/bb3243444z
  • 23. Reserved identifiers and multiple targets • Some ids must be created and managed (reserved) before going public, eg, for manuscript preparation • In test: infrastructure for multiple targets and multiple instances of any metadata element • What should user experience be for multiple targets? – Present a menu of targets (burden of choice)? – One target chosen for them (burden of inflexibility)?
  • 24. Identifier (ARK) inflections: ? ?? / . • Inflect: change endings without creating new words – Terminal ? means “I want metadata”, which is similar to linked data content negotiation (also in EZID test) – Terminal ?? means “I also want support metadata” – Drawing board: / could mean “I want a landing page” and . could mean “I want the usual computable thing” • Allow inflections beyond ARKs to DOIs/URNs?
  • 25. Example: http://n2t.net/ark:/13030/qt0349g1rh? Renninger, Heidi; Phillips, Nathan; Hodel, Donald. “Comparative hydraulic and anatomic properties in palm trees (Washingtonia robusta) of varying heights”. 2009-04-29. ark:/13030/qt0349g1rh HTML content with embedded comments in ANVL/ERC and RDF erc: who: Renninger, Heidi,; Phillips, Nathan,; Hodel, Donald, what: Comparative hydraulic and anatomic properties in palm trees (Washingtonia robusta) of varying heights when: 2009-04-29 where: ark:/13030/qt0349g1rh
  • 26. ARK community and governance • ARK mailing list: arks-forum@googlegroups.com • Topics: governance, community, standardization • Registry maintenance: shoulders and NAANs • N2T consortium with alternative EZID-like services
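
The suffix pass-thru idea from slide 21 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how N2T or EZID implement it: the `REGISTRY` table and the `resolve` function are hypothetical names, the registered target is inferred from the slide's example URL, and suffixes are assumed to split at "/" boundaries.

```python
from typing import Dict, Optional

# Hypothetical registry: a single registered name N mapped to its target T.
# The target is inferred from the example URL on slide 21.
REGISTRY: Dict[str, str] = {
    "doi:10.5072/Big4": "http://x.y.z/foo/Big4/db",
}


def resolve(identifier: str, registry: Dict[str, str] = REGISTRY) -> Optional[str]:
    """Return the target for an identifier, applying suffix pass-thru.

    If N is registered with target T, a request for N + S returns T + S.
    Returns None when no registered prefix is found.
    """
    # An exact match needs no suffix handling.
    if identifier in registry:
        return registry[identifier]

    # Otherwise strip one "/"-delimited segment at a time, so the longest
    # registered prefix wins (an assumption about how ties are broken).
    prefix = identifier
    while "/" in prefix:
        prefix = prefix.rpartition("/")[0]
        if prefix in registry:
            suffix = identifier[len(prefix):]
            return registry[prefix] + suffix
    return None


if __name__ == "__main__":
    print(resolve("doi:10.5072/Big4/table/cell/45-8.txt"))
```

With this approach only one name is registered and managed, while each of the dataset's components stays addressable through its suffix.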

Editor's Notes

  1. Academic, non-profit, government, and commercial